300 research outputs found
Learning from Scarce Experience
Searching the space of policies directly for the optimal policy has been one
popular method for solving partially observable reinforcement learning
problems. Typically, with each change of the target policy, its value is
estimated from the results of following that very policy. This requires a large
number of interactions with the environment as different policies are
considered. We present a family of algorithms based on likelihood ratio
estimation that use data gathered when executing one policy (or collection of
policies) to estimate the value of a different policy. The algorithms combine
estimation and optimization stages. The former utilizes experience to build a
non-parametric representation of an optimized function. The latter performs
optimization on this estimate. We show positive empirical results and provide
a sample complexity bound. Comment: 8 pages, 4 figures
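The likelihood ratio idea in the abstract above can be written out as a short derivation. This is a sketch under assumptions, not the paper's own formulation: the notation ($h$, $E(h)$, $R(h)$, $\pi$, $\pi'$) is introduced here for illustration, policies are assumed memoryless, and a single data-gathering policy $\pi'$ is assumed.

```latex
% Illustrative notation (assumed, not from the paper):
%   h    = (o_1, a_1, r_1, \dots, o_T, a_T, r_T), one recorded history
%   E(h) = product of all environment terms (observations, rewards)
%   R(h) = return of history h
%   \pi' = policy that gathered the data, \pi = policy being evaluated
\begin{align}
P(h \mid \pi) &= E(h) \prod_{t=1}^{T} \pi(a_t \mid o_t), \\
\frac{P(h \mid \pi)}{P(h \mid \pi')}
  &= \prod_{t=1}^{T} \frac{\pi(a_t \mid o_t)}{\pi'(a_t \mid o_t)}, \\
\hat{V}(\pi) &= \frac{1}{N} \sum_{i=1}^{N} R(h_i)
  \prod_{t=1}^{T} \frac{\pi(a_{i,t} \mid o_{i,t})}{\pi'(a_{i,t} \mid o_{i,t})},
  \qquad h_i \sim P(\cdot \mid \pi').
\end{align}
```

The unknown environment term $E(h)$ cancels in the ratio, which is why histories gathered under one policy can be reused to estimate the value of another without a model of the environment.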
Policy Improvement for POMDPs Using Normalized Importance Sampling
We present a new method for estimating the expected return of a POMDP from experience. The estimator does not assume any knowledge of the POMDP and allows the experience to be gathered with an arbitrary set of policies. The return is estimated for any new policy of the POMDP. We motivate the estimator from function-approximation and importance sampling points-of-view and derive its theoretical properties. Although the estimator is biased, it has low variance and the bias is often irrelevant when the estimator is used for pair-wise comparisons. We conclude by extending the estimator to policies with memory and compare its performance in a greedy search algorithm to the REINFORCE algorithm, showing an order of magnitude reduction in the number of trials required.
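A minimal sketch of a normalized (self-normalized) importance sampling estimate of return may make the abstract above more concrete. It is not the paper's implementation: the function names, the dictionary-based policy representation, the single data-gathering policy, and the one-observation toy problem are all assumptions made for illustration.

```python
# Sketch of normalized importance sampling for policy evaluation.
# Assumptions (not from the paper): memoryless policies stored as
# policy[observation][action] -> probability, and one behavior policy.

def trajectory_weight(steps, target, behavior):
    """Product of per-step probability ratios target/behavior."""
    weight = 1.0
    for obs, action in steps:
        weight *= target[obs][action] / behavior[obs][action]
    return weight


def normalized_is_return(trajectories, target, behavior):
    """Weighted average of logged returns; the weights are normalized
    to sum to one, which introduces bias but reduces variance."""
    weights = [trajectory_weight(steps, target, behavior)
               for steps, _ in trajectories]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("target policy has no support on the logged data")
    weighted = sum(w * ret for w, (_, ret) in zip(weights, trajectories))
    return weighted / total


# Hypothetical toy problem: one observation "o", two actions.
# Action 1 yields return 1.0, action 0 yields return 0.0.
behavior = {"o": {0: 0.5, 1: 0.5}}
logged = [([("o", 0)], 0.0)] * 50 + [([("o", 1)], 1.0)] * 50

policy_a = {"o": {0: 0.1, 1: 0.9}}  # mostly picks the rewarding action
policy_b = {"o": {0: 0.9, 1: 0.1}}  # mostly picks the other action

estimate_a = normalized_is_return(logged, policy_a, behavior)
estimate_b = normalized_is_return(logged, policy_b, behavior)
print(estimate_a, estimate_b)
```

In this toy setting the two estimates land close to the true values of 0.9 and 0.1, so the pair-wise comparison between the candidate policies comes out in the right order even though both are computed from the same logged data.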
An Electronic Market-Maker
This paper presents an adaptive learning model for market-making under the reinforcement learning framework. Reinforcement learning is a learning technique in which agents aim to maximize the long-term accumulated rewards. No knowledge of the market environment, such as the order arrival or price process, is assumed. Instead, the agent learns from real-time market experience and develops explicit market-making strategies, achieving multiple objectives including the maximization of profits and minimization of the bid-ask spread. The simulation results show initial success in bringing learning techniques to building market-making algorithms.
Search for gamma-ray emission from p-wave dark matter annihilation in the Galactic Center
Indirect searches for dark matter through Standard Model products of its
annihilation generally assume a cross-section which is dominated by a term
independent of velocity (s-wave annihilation). However, in many DM models an
s-wave annihilation cross-section is absent or helicity suppressed. To
reproduce the correct DM relic density in these models, the leading term in the
cross section is proportional to the DM velocity squared (p-wave
annihilation). Indirect detection of such p-wave DM is difficult because the
average velocities of DM in galaxies today are orders of magnitude slower than
the DM velocity at the time of decoupling from the primordial thermal plasma,
suppressing the annihilation cross-section today by some five orders of
magnitude relative to its value at freeze out. Thus p-wave DM is out of reach
of traditional searches for DM annihilations in the Galactic halo. Near the
region of influence of a central supermassive black hole, such as Sgr A*,
however, DM can form a localized over-density known as a 'spike'. In such
spikes the DM is predicted to be both concentrated in space and accelerated to
higher velocities, allowing the gamma-ray signature from its annihilation to
potentially be detectable above the background. We use the Large Area
Telescope to search for the gamma-ray signature of p-wave annihilating DM
from a spike around Sgr A* in the energy range 10 GeV-600 GeV. Such a signal
would appear as a point source and would have a sharp line or box-like spectral
features difficult to mimic with standard astrophysical processes, indicating a
DM origin. We find no significant excess of gamma rays in this range, and we
place upper limits on the flux in gamma-ray boxes originating from the
Galactic Center. This result, the first of its kind, is interpreted in the
context of different models of the DM density near Sgr A*. Comment: 16 pages, 7 figures
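The "some five orders of magnitude" suppression quoted in the abstract above can be checked with a rough calculation. This is a back-of-the-envelope sketch, not taken from the paper: the expansion coefficients $a$ and $b$ and the two velocities are typical illustrative values assumed here.

```latex
% Assumed illustrative values (not from the paper):
%   v_0 ~ 10^{-3} c   typical DM velocity in a galactic halo today
%   v_f ~ 0.3 c       typical DM velocity at freeze out
\begin{align}
\langle \sigma v \rangle &\approx a + b\, v^2
  \qquad (a:\ s\text{-wave term},\quad b\, v^2:\ p\text{-wave term}), \\
\left.\frac{\langle \sigma v \rangle_{\text{today}}}
           {\langle \sigma v \rangle_{\text{freeze out}}}\right|_{a = 0}
  &= \left(\frac{v_0}{v_f}\right)^{2}
  \approx \left(\frac{10^{-3}}{0.3}\right)^{2}
  \approx 10^{-5}.
\end{align}
```

When the s-wave term vanishes, the quadratic velocity dependence alone accounts for a suppression of roughly five orders of magnitude, which is why the higher velocities expected in a spike matter for detectability.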